Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD

Authors

  • Nasrin Taghizadeh
  • Hesham Faili

Abstract

Wordnet is an effective resource in natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. Wordnets have so far been constructed for many languages; however, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to build a high-quality, large-scale wordnet for resource-poor languages. The proposed method benefits from cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The method has been applied to Persian as a resource-poor language, and the resulting wordnet has been evaluated through several experiments. Results show that the induced wordnet has a precision of 90% and a recall of 35%.
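The abstract names the ingredients (a bilingual dictionary, a monolingual corpus, cross-lingual word sense disambiguation and EM) but not the model itself. The Python sketch below is a hypothetical illustration of how such pieces can fit together, not the authors' algorithm. It assumes access to an English wordnet, stood in for by a toy `english_synsets` map and a hypothetical per-synset `signatures` set of related English words. Candidate synsets for a source word come from its dictionary translations, EM estimates P(synset | word) by treating the sense of each corpus occurrence as a latent variable, and links below a threshold (0.2 here, an arbitrary choice) are pruned, trading recall for precision. All function names are invented for this example.

```python
# Hypothetical sketch of EM-based word-to-synset linking; not the authors' model.
import math
from collections import defaultdict


def translations(word, bilingual_dict):
    """English translations of a source-language word (empty set if unknown)."""
    return set(bilingual_dict.get(word, []))


def candidate_synsets(word, bilingual_dict, english_synsets):
    """Union of the English synsets of every translation of `word`."""
    cands = set()
    for t in translations(word, bilingual_dict):
        cands.update(english_synsets.get(t, []))
    return sorted(cands)


def context_model(synset, vocab, bilingual_dict, signatures, boost=4.0):
    """Fixed P(c | s) over the context vocabulary (hypothetical scoring):
    a context word whose translation occurs in the synset's English
    signature gets `boost` times the mass of any other word."""
    sig = signatures.get(synset, set())
    raw = {c: boost if translations(c, bilingual_dict) & sig else 1.0 for c in vocab}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def em_sense_priors(corpus, bilingual_dict, english_synsets, signatures, iterations=20):
    """Estimate P(s | w) for each word's candidate synsets with EM,
    treating the synset of each corpus occurrence as a latent variable."""
    vocab = sorted({c for _, ctx in corpus for c in ctx})
    occurrences = defaultdict(list)
    for w, ctx in corpus:
        occurrences[w].append(ctx)
    priors = {}
    for w, contexts in occurrences.items():
        cands = candidate_synsets(w, bilingual_dict, english_synsets)
        if not cands:
            continue
        models = {s: context_model(s, vocab, bilingual_dict, signatures) for s in cands}
        # log P(context | s) for every occurrence; these stay fixed during EM.
        loglik = [{s: sum(math.log(models[s][c]) for c in ctx) for s in cands}
                  for ctx in contexts]
        pi = {s: 1.0 / len(cands) for s in cands}  # uniform start
        for _ in range(iterations):
            expected = {s: 0.0 for s in cands}
            for ll in loglik:
                # E-step: posterior over candidate synsets for one occurrence.
                scores = {s: math.log(pi[s]) + ll[s] for s in cands}
                m = max(scores.values())
                z = sum(math.exp(v - m) for v in scores.values())
                for s in cands:
                    expected[s] += math.exp(scores[s] - m) / z
            # M-step: new P(s | w) is the average posterior (floored to keep log finite).
            pi = {s: max(expected[s] / len(loglik), 1e-12) for s in cands}
        priors[w] = pi
    return priors


def induce_links(priors, threshold=0.2):
    """Keep only (word, synset) links whose estimated probability clears the threshold."""
    return sorted((w, s) for w, pi in priors.items() for s, p in pi.items() if p >= threshold)


# Toy data: transliterated Persian words; synset ids are illustrative strings.
bilingual_dict = {
    "shir": ["lion", "milk", "tap"],
    "jangal": ["jungle", "forest"],
    "shekar": ["hunt", "prey"],
    "gav": ["cow"],
    "nushidan": ["drink"],
}
english_synsets = {"lion": ["lion.n.01"], "milk": ["milk.n.01"], "tap": ["faucet.n.01"]}
signatures = {
    "lion.n.01": {"jungle", "hunt", "prey"},
    "milk.n.01": {"cow", "drink", "dairy"},
    "faucet.n.01": {"water", "sink", "pipe"},
}
corpus = [
    ("shir", ["jangal", "shekar"]),
    ("shir", ["jangal"]),
    ("shir", ["gav", "nushidan"]),
    ("shir", ["nushidan"]),
]

priors = em_sense_priors(corpus, bilingual_dict, english_synsets, signatures)
links = induce_links(priors)
print(links)  # [('shir', 'lion.n.01'), ('shir', 'milk.n.01')]
```

In this toy run, the Persian word shir (transliterated) translates to lion, milk and tap. No corpus context supports the faucet sense, so each EM iteration shrinks its probability until it falls below the threshold, and only the lion and milk links survive.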

Similar resources

Applying cross-lingual WSD to wordnet development

The automatic development of semantic resources constitutes an important challenge in the NLP community. The methods used generally exploit existing large-scale resources, such as Princeton WordNet, often combined with information extracted from multilingual resources and parallel corpora. In this paper, we show how Cross-Lingual Word Sense Disambiguation can be applied to wordnet development. ...

Five Languages Are Better Than One: An Attempt to Bypass the Data Acquisition Bottleneck for WSD

This paper presents a multilingual classification-based approach to Word Sense Disambiguation that directly incorporates translational evidence from four other languages. The need for a large predefined monolingual sense inventory (such as WordNet) is avoided by taking a language-independent approach where the word senses are derived automatically from word alignments on a parallel corpus. As a ...

Building A Chinese WordNet Via Class-Based Translation Model

Semantic lexicons are indispensable to research in lexical semantics and word sense disambiguation (WSD). For the study of WSD for English text, researchers have been using different kinds of lexicographic resources, including machine readable dictionaries (MRDs), machine readable thesauri, and bilingual corpora. In recent years, WordNet has become the most widely used resource for the study of...

SemEval-2007 Task 01: Evaluating WSD on Cross-Language Information Retrieval

This paper presents a first attempt at an application-driven evaluation exercise of WSD. We used a CLIR testbed from the Cross Lingual Evaluation Forum. The expansion, indexing and retrieval strategies were fixed by the organizers. The participants had to return both the topics and documents tagged with WordNet 1.6 word senses. The organization provided training data in the form of a pre-proce...

A Naïve Bayes Approach to Cross-Lingual Word Sense Disambiguation and Lexical Substitution

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing [1]. It is claimed that WSD is essential for applications that require language comprehension modules, such as search engines, machine translation systems, automatic answer machines, second life agents, etc. Moreover, with the huge amounts of information on the Internet and the fa...

Journal:
  • J. Artif. Intell. Res.

Volume 56

Publication date 2016